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PkANN — I. Non-linear matter power spectrum 
interpolation through artificial neural networks 
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ABSTRACT 

We investigate the interpolation of power spectra of matter fluctuations using Arti- 
ficial Neural Network (PkANN). We present a new approach to confront small-scale 
non-linearities in the power spectrum of matter fluctuations. This ever-present and 
pernicious uncertainty is often the Achilles' heel in cosmological studies and must be 
reduced if we are to see the advent of precision cosmology in the late-time Universe. We 
show that an optimally trained artificial neural network (ANN) , when presented with 
a set of cosmological parameters (r2i„/i^, i7b/i^, rig, wojCg, ^ttIj^ and redshift z), can 
provide a worst-case error < 1 per cent (for z < 2) fit to the non-linear matter power 
spectrum deduced through A^-body simulations, for modes up to fc < 0.7/iMpc~^. 
Our power spectrum interpolator is accurate over the entire parameter space. This 
is a significant improvement over some of the current matter power spectrum calcu- 
lators. In this paper, we detail how an accurate interpolation of the matter power 
spectrum is achievable with only a sparsely sampled grid of cosmological parameters. 
Unlike large-scale A^-body simulations which are computationally expensive and/or 
infeasible, a well-trained ANN can be an extremely quick and reliable tool in inter- 
preting cosmological observations and parameter estimation. This paper is the first in 
a series. In this method paper, we generate the non-linear matter power spectra using 
HALOFIT and use them as mock observations to train the ANN. This work sets the 
foundation for Paper II, where a suite of A'^-body simulations will be used to compute 
the non-linear matter power spectra at sub-per cent accuracy, in the quasi-non-lincar 
regime (0.1 /iMpc"^ < fc < 0.9/iMpc"^). A trained ANN based on this A^-body suite 
will be released for the scientific community. 
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1 INTRODUCTION 

Studying the growth of structure and the distribution of 
galaxies in our Universe is a potent method for understand- 
ing both fundamental physics and the cosmological model. 
Measurements of the large scale structure are capable of 



and the Baryon Oscillation Spectroscopic Survey (BOSS) 
(Eisenstein et al. 20111. These surveys promise to achieve 



testing our theory of gravity (Amendola, Kunz, & Sapone 



2008 1 , distinguishing between dark energy models and prob- 



ing the absolute neutrino mass scale (Thomas, Abdalla, & 



Lahav 20101. It is also an immense complement to the in- 



crementally cosmic variance limited cosmic microwave back- 
ground. This complementarity explains the overwhelmingly 
large number of galaxy surveys on the horizon that promise 
to refine or even alter our understanding of the cosmos, e.g. 



DES (The Dark Energy Survey Collaboration 20051, the 



Large Synoptic Survey Telescope (LSST) ( Ivezic et al. 2008 1 



high-precision measurements of galaxy power spectrum am- 
plitudes and offer a possibility to improve constraints on 
cosmological parameters including dark energy and neutrino 
masses. However, with this promise comes a great technical 
and systematic difficulty. 

Arguably the most ubiquitous problem in both galaxy 
clustering and weak lensing surveys is that as structures 
collapse they evolve from being linear, for which one can 
solve analytically, to non-linear, for which one cannot. Us- 
ing A'^-body simulations ( Heitmann et al.]|2010 Agarwal & 
Feldnian|2011[ ) and analytical studies inspired from pertur- 
bation theory (PT) ([Scoccimarro, Zaldarriaga, fc H ui"1999| 
Saito, Takada, fc Taruya||2008 l, the non- linear effects have 
been shown to be significant compared to the precision levels 
of future surveys. A consequence of this is the uncertainty in 
calculating the theoretical power spectra over smaller scales 
and at low redshift. There is frequently a choice to either ex- 
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elude - and therefore waste - the wealth of available and ex- 
pensively obtained data, or to use an inaccurate procedure, 
which may bias and invalidate any measurement determined 
with anticipated precision. 

At present there are several approaches to dealing 
with this fruitful yet frustrating regime. One is to use so- 
phisticated A'^-body simulations commonly produced with 
codes such as enzo (lO'Shea et al.l 120101) and gadget 



( Springell 2005 1 . The most popular non-linear prescription 



HALOFIT (Smi th et al.|20d3 l is a semi-analytical fit and has 
been fantastically successful. However, with larger and ever- 
improving state of the art of A''-body simulations, the non- 
linearities on smaller scales have been shown to be at levels 
higher than the ones that were used in calibrating halofit. 
As such, on smaller scales (A: > 0.1/iMpc~^) the matter 
power spectra predicted by halofit do not match the high 
precision A-body results well enough. If we are to perform 
precision cosmology it is imperative to go far beyond the 
levels of precision offered by current analytical approxima- 
tions. An obstacle to further progress in obtaining accurate 
fits to underlying spectra is the vast computational demand 
from detailed A^-body simulations and a high dimensionality 
in the cosmological parameter space. 



derstanding and physics. However, we view this direction as 
a pragmatic one: a new approach is urgent given the im- 
pending flood of new data from upcoming surveys, and in 
an age of supposed precision cosmology, we will be theory 
limited in this specific area. It is therefore crucial to strive 
towards per cent level precision in the determination of the 
non-linear power spectrum. 

Machine-learning techniques - in the form of Gaussian 
processes - have already been used as cosmological non- 



linear emulators in Habib et al. (20071, Schneider et al. 



(|2008[),|Heitmann et al.| (|2009|, [Lawrence et al.| ( |2010| and 
Schneider, Holm, fc Knox| (|2011[). Gaussian process model- 



ing (see MacKay 1997 Rasmussen & Williams 2006 for a 



basic introduction to Gaussian processes) is a non-linear in- 
terpolation scheme that, after optimal learning, is capable of 
making predictions when queried at a suitable input setting. 
There are several advantages and disadvantages when using 
neural networks and Gaussian processes to interpolate data. 
From a practical point of view, a neural network compresses 
data into a small number of weight parameters, so a large 
number of simulations could be fitted into a small number 
of files whereas a Gaussian process has to carry a large ma- 
trix which can be of the order of the number of points used 



There have been attempts (see Bird, Viel, & Haehnelt for training the Gaussian process. [Heitmann et al.| ( [2009 1 



2012 1 to calibrate halofit using A-body simulations to pre- 



dict suppression of matter power spectrum for cosmologi- 
cal models with massive neutrinos. However, semi-analytical 
fits like HALOFIT will themselves become obsolete with near- 
future surveys that promise to reach per cent level of preci- 
sion. Moreover, implementing neutrinos as particles in nu- 
merical simulations is a topic of ongoing research, with re- 



sults (see Brandbyge & Hanncstad 2009 Viel, Haehnelt, & 



[Springcl 2010) contradictory at a level (factor of ~ 5 or 
higher) that can not be justified as due to (non)-inclusion 
of baryonic physics. 

An alternative procedure to tackle small-scale non- 



linearities is to use higher order PT {e.g. Saito et al 



[Nishimichi et al.[[2009} [Saito, Takada, fc Taruya[ 



2008 



2009 



to push further into the quasi-linear domain. Using high- 
resolution A^-body simulations as reference, [Garlson, White^ 
[fc Padmanabhaii] ([2009) have shown that although PT im- 
proves upon a linear description of the power spectrum at 
large scales (k ~ 0.04 hMpc~^), it expectedly fails on smaller 
scales {k ^ 0.08 /iMpc~^). The range of scales where PT is 
reliable at the per cent level is both redshift and cosmo- 
logical model dependent. For cosmologies close to WMAP 
best-fit parameters, [Taruya et al.[ ( |2009[ ) have shown that 
at redshift z = 0, the one-Loop standard perturbation se- 
quence to the non-linear matter power spectrum is expected 
to converge with the A-body simulation results to within 1 
per cent - only for scales k <, 0.09 hMpc~^ . With the mea- 
surements from surveys expected to be at 1 per cent level 
precision, these upcoming data sets create new challenges 
in analyses and need alternative ways to efficiently estimate 
cosmological parameters. 

One might argue that a machine-learning approach to 
determine the non-linear response from varying parameter 
settings is a rather black-box approach that goes against 
the traditional approach to spectra: based on scientific un- 



dealt with large matrices by using principal component anal- 
ysis (PCA) to reduce their sizes to ones easily manipulated. 
Again from a practical point of view, usually Gaussian pro- 
cesses can do better than neural networks in the case of a 
small number of training points given that a neural network 
could be flexible enough to be misused and misfit the data. 
From a theoretical point of view, the two methods should 
fare equally especially as there are certain kernels used in 
Gaussian processes which are equivalent to the interpolation 
and fit one would have with neural networks. Overall, given 
the implementation, we believe that the two methods should 
produce equivalent results especially if the artificial neural 
network (ANN) procedure is trained by a larger number of 
simulations. 

While using any machine-learning technique in lieu of 
TV-body simulation output, it is critical that (i) the queried 
input setting not lie outside the input parameter ranges that 
are used during machine learning and (ii) the input param- 
eter space must be sampled densely enough for the machine 
procedure to interpolate/predict accurately. Machine learn- 
ing has been used in the fitting of cosmological functions 



([Auld et al.|2007[[Fendt fc Wandelt|2007[ [Auld, Bridges, fc 



Hobson|2008 1 and photometric redshifts ( CoUister fc Lahav 
2004[ ) 

HALOFIT is accurate at the 5—10 per cent level at best 



(see Heitmann et al. 2010). A far more accurate matter 
power spectrum calculator is the COSMIC emulator (see 
Heitmann et al. 2009 Lawrence et al. 20101; although ac- 



curate at sub-per cent level, it makes predictions that are 
valid only for redshifts z < 1 and does not include cosmo- 
logical models with massive neutrinos. In order to extend 
the interpolation validity range to z < 2, as well as improve 
the accuracy levels, we work on a new technique to fit re- 
sults from cosmological A^-body simulations using an ANN 
procedure with an improved Latin hypercube sampling of 
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the cosmological parameter space. Using halofit spectra 
as mock A'^-body spectra, we show that the ANN formal- 
ism enables a remarkable fit with a manageable number of 
simulations. 

The outline of this paper is as follows. We discuss the 
concept of machine learning, in particular the details of neu- 
ral networks, in Section [2] The improved Latin hypercube 
sampling of the underlying parameter space, which keeps the 
simulation number manageable and fitting accuracy high, is 
detailed in Section [3] Our fitting results are included in Sec- 
tion [4] Finally, we conclude in Section [5] 



MACHINE LEARNING 
NETWORK 



THE NEURAL 



Machine learning is associated with a series of algorithms 
that allow a computational unit to evolve in its behavior, 
given access to empirical data. The major benefit is the po- 
tential to automatically learn complex patterns. As a subset 
of artificial intelligence, machine learning has been used in 
a variety of applications ranging from brain-machine inter- 
faces to the analyses of stock market. There exist a range 



of techniques (see e.g. Nilsson 2005 1 including genetic al- 
gorithms, decision tree learning and gaussian processes (as 
referenced above). In this work we focus on the neural net- 
work technique. 

An ANN is simply an interconnection of neurons or 
nodes analogous to the neural structure of the brain. This 
can take a more specific form whereby the nodes are ar- 
ranged in a series of layers with each node in a layer con- 
nected, with a weighting, to all other nodes in adjacent lay- 
ers. This is often referred to as a multi-layer perceptron 
(MLP). In this case one can impart values onto the nodes 
of the first layer (called the input layer), have a series of 
hidden layers and finally receive information from the last 
layer (called the output layer) . The configuration of nodes is 
often called the network's architecture and is specified from 
input to output as A^i : Ai^i : ... : N„ : No. That is, a network 
with an architecture 7 : 49 : 50 has seven inputs, a single 
hidden layer with 49 nodes and finally 50 outputs. An extra 
node (called the bias node) is added to the input layer and 
connects to all the nodes in the hidden layers and the output 
layer (see Bishop|[l995 for more details). The total number 
of weights Nw for a generic architecture Ni : Ni : ... : Nn : 
No can be calculated using the formula 

n n 

Nw = N,-Ni+Y, Ni^i ■Ni+Nr,-No + Y,Ni + No, (1) 

1=2 1=1 

where the summation index I is over the hidden layers only. 
For a network with a single hidden layer, the second term on 
the right-hand side is absent. Specifically, the architecture 7 
: 49 : 50 has a total of 7 x 49 + -H 49 x 50 + 49 4- 50 = 2892 
weights, collectively denoted by w. 

Each node (except the input nodes) is a neuron with an 
activation, Zj — g{a), taking as its argument 



= E^ 



where the sum is over all nodes i (of the previous layer) send- 
ing connections to the jth node (of the current layer). The 
activation functions are typically taken to be sigmoid func- 
tions such as g{a) — I/[I -|-exp(— a)]. The sigmoid functions 
impart some degree of non-linearity to the neural network 
models. A network becomes overly non-linear if the weights 
w deviate significantly from zero. This drives the activation 
of the nodes to saturation. The number and size of the hid- 
den layers add to the complexity of ANNs. For a particular 
input vector, the output vector of the network is determined 
by progressing sequentially through the network layers, from 
inputs to outputs, calculating the activation of each node. 

Adjusting the weights w to get the desired mapping is 
called the training of the network. For matter power spec- 
trum estimation, we use a training set of A^'-body simulations 
for which we have full information about the non-linear mat- 
ter power spectra Pni{k,z), as well as the underlying cos- 
mological parameters: I = {Q,inh^,i^bh^,ns,wo,o-s,^m,y), 
where /i, ilni, i^b,'T-s, Wojfs and '^ni,j are the present-day 
normalized Hubble parameter in units of 100 km s~^ Mpc~^, 
the present-day matter and baryonic normalized energy den- 
sities, the primordial spectral index, the constant equation 
of state parameter for dark energy, the amplitude of fluctu- 
ation on an 8h~^ Mpc scale and the total neutrino mass, 
respectively. 

Given the training set, the network can be used to 
learn some parametrization to arbitrary accuracy by train- 
ing the weights w. This is done by minimizing a suitable 
cost function. 



V-V-[^nr^(fc,^,W,I, 



t=i fe=i 



Pnl{k,Z,It)] 



a{k,z,It 



(3) 



with respect to the weights w. The sum t is over all the 
cosmologies It in the training set. The sum k is over all 
the nodes in the output layer. Note that each output node 
samples the matter power spectrum at some specific scale, 
k (/iMpc~^). Pni{k, z, I) is the true non-linear matter power 
spectrum for the specific cosmology I. In this paper, we 
use halofit's approximation for Pn\{k,z). In Paper II A''- 
body simulations will be used to calculate Pni{k,z). Given 
the weights w, P,j'^^(fc, 2, w,I) is the ANN's predicted 
power spectrum for the Ith cosmology. In our fitting pro- 
cedure, we work with the ratio of the non-linear to lin- 
ear power spectrum, namely R{k,z) = Pni{k,z)/Piin{k,z), 
where Piin{k,z) is calculated using CAMB (Lewis, Challinor, 



|fc Lasenby|2000[ ). As such, weighting the numerator in Eq. 13 
by cr = P]in{k, z) gives 



- EE 



t=i fe=i 

T c 



\k,z,w,lf.) - Pui{k,z,It 



Puu{k,z,It) 
= y~^ y~^ [RANNJk, z, w, IJ - R{k, z, lt)f 



(4) 



(5) 



(2) 



The ratio R{k, z) is a flatter function and gives better per- 
formance, particularly at higher redshifts where the ra- 
tio tends to 1. Given the weights w, 7?ANN(fc, 2:, w, I) in 
Eq. [5] is the network's prediction of the ratio R{k, z, I) for 
the specific cosmology I. The predicted non-linear spec- 
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trum P^i^^{k, z,w,l) in Eq. NJ is recovered by multiply- 
ing RANN{k, z,w,l) by the corresponding linear spectrum 

Plin(fe,2,I). 

We ran A''-body simulations over a range of cosmologi- 
cal parameters with the publicly available adaptive mesh re- 
fineme nt (AMR), grid-based hybrid (hydro+A ^-body) code 
ENZCpl lNorman et al.|2d07||O'Shea et al.|2010[ ). We include 
radiative cooling of baryons using an analytical approxima- 
tion ([Sarazin fc White]|1987) for a fully ionized gas with a 



metallicity of 0.5 Mq. The cooling approximation is valid 
over the temperature range from 10^ — 10^ K. Below 10* 
K, the cooling rate is effectively zero. However, we do not 
account for metal-line cooling, supernova (SN) feedback or 
active galactic nucleus (AGN) feedback. It is worth men- 
tioning here that van Daalen et al. (2011 1 have shown that 



the inclusion of AGN feedback can reproduce the optical 
and X-ray observations of groups of galaxies, and decrease 
the power relative to dark matter-only simulations at 2: = 0, 
ranging from 1 per cent at fc « 0.4/iMpc~^ to as much as 
10 per cent at fc « l/iMpc~^. As such, understanding and 
including the effects of baryonic physics in numerical sim- 
ulations will be critical to predicting the non-linear matter 
power spectrum at sub-per cent level. Further, the ANN pre- 
scription we are using in this paper could also be used for 
fitting these kinds of baryonic effects by introducing addi- 
tional parameters beyond the cosmological ones, especially 
since gasdynamical runs are much more expensive than dark 
matter-only simulations. 

In Fig. [1] top panel, we show the power spectrum for 
a cosmological model I = (0.13, 0.0224, 0.986, -1.23, 0.72, 
eV), with h = 0.8. The spectrum is evaluated at red- 
shift 2 = (upper set) and z = 1 (lower set). At each red- 
shift, the power spectrum is calculated using (i) linear the- 
ory (dot-dashed), (ii) A'^-body (dots), (iii) halofit (dash) 
and (iv) COSMIC emulator (sohd). The vertical dash hue 
at fc = 0.9hMpc~^ is the highest k upto which our A^'-body 
power spectrum is accurate at per cent level. We average 
over 10 realizations of the initial power spectrum to sup- 
press the scatter in the A'^-body results. On smaller scales, 
our numerical simulations do not have the force resolution 
to give accurate results. The ratio of the A-body spectrum 
to COSMIC emulator's prediction is shown in the middle 
(2 — 0) and bottom (2 — 1) panels. The error bars corre- 
spond to the scatter in the A-body results. 

In this method paper, we match the linear theory power 
spectrum to the non-linear power spectrum from simula- 
tions at A; = 0.09/iMpc~^. In Paper II we intend to use 
the one-Loop standard FT as implemented by |Saito et al.| 
( 2008 1 for estimating the matter power spectrum upto k < 
0.085 /iMpc~^ and stitch it with the A^-body spectrum. The 
stitched spectrum will be sampled at 50 fc-values between 
0.006/iMpc"^ < k < l/iMpc"\ In Fig. H] we show the 
stitched-and-sampled A-body power spectrum (solid dots) 
which we will use as Pni{k,z) for ANN training in Faper 
II. The HALOFIT spectrum, sampled at the same fe-values. 



http://lca.ucsd.edu/projccts/enzo 
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Figure 1. Top: matter power spectrum evaluated at redshift 2 = 
(upper set) and 2 = 1 (lower set). At each redshift, the spectrum 
is calculated using (i) linear theory (dot-dashed), (ii) A^-body 
(dots), (iii) HALOFIT (dashed) and (iv) COSMIC EMULATOR (soUd, 
see [Lawrence et al.|2010) . The cosmological parameters are: I = 
(0.13,0.0224,0.986,-1.23,0.72,0) with h = 0.8. In this method 
paper, we use the linear matter power spectrum for scales k < 
0.09/iMpc~^. Due to the lack of force resolution on small scales, 
our Af-body power spectrum is accurate at per cent level for k < 
0.9 fi,Mpc~l . Middle: The ratio of the Af-body spectrum to COSMIC 
emulator's prediction at 2 = 0. The error bars correspond to the 
scatter in the A-body results. The horizontal dotted lines denote 
it2,itl and per cent error. Bottom: The same as the middle 
panel, at 2 = 1. 



is also shown (open circles) and is used as Pni(fe,2) in this 
method paper. 

In Eq. [5] optimizing the weights w in order to minimize 
X^ generates an ANN that predicts the power spectrum very 
well for the specific cosmologies in the training set. How- 
ever, such a network might not make accurate predictions 
for cosmologies not included in the training set. This usu- 
ally indicates (i) an overly simple network architecture (very 
few hidden layer nodes), (ii) very sparsely/poorly sampled 
parameter space and/or (iii) a highly complex non-linear 
mapping that actually over-fits to the noise on the training 
dataset. In order to generate smoother network mappings 
that generalize better when presented with new cosmologies 
that are not part of the training set, a penalty term xq is 
added to the cost function x^y 



XQ = OiJ2^ 



(6) 



where Wij is the weight connecting the jth node to the ith 
node of the next layer. XQy usually a quadratic sum of the 
weights, prevents them from becoming too large during the 
training process, by penalizing in proportion to the sum. 
The hyperparameter a controls the degree of regularization 
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Figure 2. Linear theory, HALOFIT and A'^-body spectra from 
Fig. [ll left-hand panel are re-plotted - with the only difference 
that on scales k < 0.09 hMpc~^ , the HALOFIT and Af-body spectra 
are approximated by linear theory. The stitched spectra are then 
sampled at 50 fc-values between 0.006 /iMpc~^ l£ k < IhMpc"^. 
In this paper, we use the HALOFIT spectrum as Pni{fc, z) for ANN 
training. In Paper, II the A'-body spectrum will be used along 
with the one-Loop standard PT. 



is considered finished once Xc i^ minimized with respect 
to the validation set. The trained network can now be 
used to predict Pni{k,z) for new cosmologies. In practice, 
a number of networks are trained that start with an 
alternative random configuration of weights. The trained 
networks are collectively called a committee of networks 
and subsequently give rise to better performance. The final 
output is usually given by averaging over the outputs of the 
committee members. 

The ANN technique has been used successfully in em- 



pirical photometric redshift estimation with the ANNZ ( Col- 
lister fc Lahav|[2004| package. ANNZ learns an effective pa- 



rameterization of redshift as a function of broad-band photo 
metric colours by training on a representative set of galaxies 
that have both photometric and spectroscopic information. 
This has been shown to be more successful than template or 
synthetic-based methods ( Abdalla et al.|2011 Thomas, Ab 
dalla, & Lahav||2011 1. In this work we use an MLP similar 



to the original ANNZ engine. 

Our intention is to use this neural network technique 
to learn the non-linear matter power spectrum as a function 
of cosmological parameters by training on A''-body simula- 
tions. This natural fitting procedure removes both the effort 
and unnecessary potential bias that results from invoking 
ultimately imperfect sets of fitting equations such as the 
HALOFIT. As can be seen in Fig. [2] the halofit predictions 
are in error by as much as 50 per cent on small scales. As 
we will discuss in Section |4] the ANN technique is extremely 
fast and, more importantly, accurate. 



of the network's non-linearity. After having initialized a, 
its value itself is re-estimated during the training process 



iteratively. See Bishop ( 1995 1 for more details 



Thus, the overall cost function which is presented to the 
ANN for minimization with respect to the weights w is. 



Xc = '^Y1 [-^ANN (fc, 2, w. It 
t=i fc=i 



R{k,z,It)f + aJ2-> 



(7) 
To minimize the cost function xc w.r.t. the weights w, 
we use an iterative quasi-Newton algorithm which involves 
evaluating the inverse Hessian matrix 

a2 2 

d Xc 



Hi 



dv 



(8) 



"J 

Specifically, we employ the Broyden-Fletcher- 
Goldfarb-Shanno (BFGS) method to approximate the 



inverse Hessian matrix (see Bishop 1995 for details) 



Starting with randomly assigned weights w, their values 
are re-estimated iteratively, assuring that each iteration 
proceeds in a direction that lowers the cost function 
Xc- After each iteration to the weights, Eq. u\ is also 
calculated for what is known in neural network parlance 
as a validation set, in order to avoid over-fitting to the 
training set. The validation set for our application of 
neural networks, is a small set of simulations with known 
I = {Q.jnh'^,^hh?,ns,WQ,(Js„Yjm„) and Pr,i(k,z). The 
final weights w/ are chosen such as to give the best fit 
(minimum xh) to the validation set. The network training 



3 LATIN HYPERCUBE PARAMETER 
SAMPLING 

In order to fit a set of parameters optimally one strives to 
sample them as finely and as evenly as possible. However, 
a regularly spaced grid with A'' sampling intervals along 
one dimension and d parameters scales as A'''*. For a six- 
dimensional parameter space with only 10 grid intervals, 
this quickly escalates to 10® points. The problem is exac- 
erbated because an A^-body simulation is a computationally 
expensive activity. To further compound this issue, each pa- 
rameter configuration needs to be simulated over multiple 
realizations to beat down simulation (sample) variance. An 
alternate approach could be to interpolate the fitting func- 
tion over a selection of randomly distributed points through- 
out the parameter space. However, this is prone to statistical 
clustering and will lead to a degradation of the machine- 
learning fit for parameters or regions affected by it. In or- 
der to circumvent these problems, we select parameters dis- 
tributed on a Latin hypercube. 

A square grid is said to be populated as a Latin square 
if, and only if, there is exactly one sample in each row 
and each column of the square. This is illustrated clearly 
in Fig. [3] A similar sampling scheme was developed first by 
Leonhard Euler who indexed the samples with Latin charac- 
ters, motivating the name 'Latin square'. A Latin hypercube 
is a generalization of Euler's Latin square to a higher di- 
mensional parameter space and is an example of a stratified 
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Figure 3. Left: An example of a Latin hypercube distribution. Every interval dx and dj/ is sampled; however each row and column are 
sampled only once. Right: an improved Latin hypercube where the distribution is more evenly spread through the space. Each subspace 
is equally sampled and there are no voids or clusters as in the left-hand panel (bottom left and right corners, respectively). 



sampling technique. This ensures that each and every seg- 
ment/interval along a parameter axis is sampled with high 
resolution without a vast number of points. That is, one can 
sample a d-dimensional space with n simulations and have 
all parameters evaluated along every dx = {b — a)/n incre- 
ment, where b and a are the upper and lower limits of the 
parameter, respectively. Therefore, it is independent of d. 
However, a random implementation of a Latin hypercube al- 
gorithm can still lead to statistically under-sampled regions. 
An example of this can be seen in Fig. [3] Each panel shows 
a random implementation of Latin hypercube algorithm. In 
both panels, the square is partitioned into four subspaces. 
The left-hand panel has voids (and clusters) in two of its 
subspaces. The right-hand panel has each subspace equally 
sampled (while still obeying the Latin hypercube definition) 
and represents an improved Latin hypercube sampling. In 
this case the sample space is partitioned into equally prob- 
able subspaces and the variance in the pairwise separation 
of the sampled points is minimized. 

Since the introduction of the Latin hypercube sampling 
technique ( [McKay et al.|[T979[ ), the procedure has become 
common in computer science, uncertainty analysis and engi- 
neering emulation (where simulation of complex machinery 
is overwhelmingly time consuming). Similarly, variations of 
the Latin hypercube sampling technique have been imple- 
mented in cosmological analyses before, e.g., [Habib et ah] 



( f2007| ), |Heitmann et all ( [SOOQ] ) , [Schneider et al.| ( |2011[ ) a"nd 
references therein. In this paper, we use the improved Latin 
hypercube technique to set up the cosmological models to 
be used for ANN training. 



7-year -fBAO-|-i/o (Komatsu et al. 20111 constraints (see 
Table [l|. 

Throughout this paper, we only consider spatially flat 
models with the present-day CMB temperature T^,° = 
2.7257^. We also assume that all massive neutrino species 
are degenerate. The effective number of neutrino species is 
fixed at A'off = 3.04. We derive the Hubble parameter h 
using the WMAP 7-year -l-BAO constraint on the acoustic 
scale Trdis/vs = 302.54, where dis is the distance to the last 
scattering surface and Va is the sound horizon at the redshift 
of last scattering. We derive h as follows. 

(i) For a given Qi,h^ and Q,mh^ , compute the redshift of 
the last scattering surface, zis, using the fit proposed by|Hu| 



& Sugiyama ( 1996 1 



Zls 

62 



= 1048 



1 + 



0.00124 

(f7b/l2)0-738 



[n-fei(n„/i")''=] 



0.0783 [ 
0.560 



l + 39.5(n6ft')°-''T' 



i-h2i.i(ni,/i2)i-8 



(9) 
(10) 

(11) 



(ii) For a given Q,bh? , flmh^ and Yl "mjy, choose a value 
for h and compute its evolution, h{a). Here we follow section 
3.3 from Komatsu et al. ( 2011[ |, which includes the effect of 
massive neutrinos on h{a): 



3.1 Setting up an improved Latin hypercube for 
cosmological parameters 

We varied six cosmological parameters I = 

{Qmh^ , flhh^ ,ns, Wo, OS, Yl '^i' ) between the limits spec- 
ified in Table [l] The limits on this six-dimensional 
parameter space are chosen so as to include the WMAP 



h{a) = 



F{y) 






1 + 



ih 



= A^cff 



120 



7/4 



8 V 11 



4/3 



Fiy) 



77r4 



x^^Jx^ + y2 



dx. 



+ 



Q.I. 



,3(1+11)0) 



(12) 

(13) 
(14) 



e^ -f- 1 
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Table 1. The six cosmological parameters and their ranges, used in generating the ANN training and validation sets. This six-dimensional 
parameter space is sampled using the improved Latin hypercube technique (see text for details) . The last column shows the corresponding 
WMAP 7-year+BAO-|-i?o constraints at 68 per cent CL. 



Cosmological parameters 



Lower value Upper value WMAP T-year+BAOH 



WQ 

Em, (eV) 



0.110 


0.165 


0.1352 ± 0.0036 


0.021 


0.024 


0.02255 ± 0.00054 


0.85 


1.05 


0.968 ± 0.012 


-1.35 


-0.65 


-1.1 ± 0.14 


0.60 


0.95 


0.816 ± 0.024 





1.1 


< 0.58'' 



Note. 



Komatsu et al. (2011 1; 95 per cent CL for w = — 1. 



y 


= 




rp 


= 


[n) ^- 


o 




2.4706 X 10"^ 



h2 



T-, 



2.725 



Ti/" is the present-day neutrino temperature and Q-y is 
the present-day normalized photon energy density. Given 
^ m,, the function F{y) calculates the contribution of neu- 
trinos to the radiation energy density at scale factor a. 

(iii) Using h{a) from step (ii), compute the comoving 
sound horizon rs{z) at the last scattering redshift zu: 



rs{zis) = 



i/{i+z,J 



V3 Ja=0 



a'^h{a) yr+~(3tV4n^ ' 



(15) 



(iv) Using rs{zis) from step (iii), together with the 
WMAP 7-year-|-BA0 constraint on the acoustic scale 
Trdis/rs — 302.54, compute the comoving distance to the 
last scattering surface, dis'. 



302.54 , 

iis = rs(zis 



(16) 



(v) Using h{a) from step (ii), compute the comoving 
distance to the surface of last scattering x{zis)'- 



Xizis) 



da 



1/(1+^;.) 



2ft(a)' 



(17) 



(vi) Compare results from steps (iv) and (v). Minimize 
the difference \dis — x{zis)\ by varying h in step (ii) and 
re-estimating steps (ii)-(v). 

Using Table [T] as the parameter priors, we sampled this 
six-dimensional parameter space with an improved Latin hy- 
percube technique. We generated 130 cosmologies to be used 
as the ANN training set and another 32 cosmologies for the 
validation set. We show the training set (upper triangle) and 
the validation set (lower triangle) in Fig. [4] 

As can be seen in Fig. [3l a major advantage of im- 
proved Latin hypercube sampling technique is the relatively 
uniform coverage it provides. This is, of course, highly use- 
ful for training a machine-learning algorithm. As with any 
interpolation mechanism, one hopes that the neural network 
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Figure 4. Upper triangle: ANN training set with 130 viable 
cosmologies, in a six-dimensional parameter space. Lower tri- 
angle: ANN validation set with 32 viable cosmologies, in a six- 
dimensional parameter space. See Tablelllfor the parameter priors 
used to generate the training and validation sets. 



can generalize from what it has learned to new and slightly 
different input data (in this case cosmological parameters). 
In reality, the response will be uncertain in poorly trained 
areas. Therefore, the caveat with our sampling will reside 
near the edges of the parameter hypercube. A parameter 
value that we might want emulated may not be encapsu- 
lated within the hypervolume of a simulated, and therefore 
trained, point. This can be understood with reference to 
Fig. [4] The performance of a neural network can severely de- 
grade near the parameter boundaries. The solution is simply 
to choose prior ranges that are marginally wider than those 
of real interest. The allowance could easily be found em- 
pirically by projecting the hypercube realizations. The real 
problem in cosmology therefore arises when one has a pa- 
rameter that is physically bounded, an example being the 
neutrino mass Yl ^^v ^ 0. 
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Figure 5. Upper triangle: extending the ANN training set (upper 
triangle in Fig.ffl with 70 cosmologies with J^ m^ = 0. Lower tri- 
angle: extending the ANN validation set (lower triangle in Fig. Hi 
with 18 cosmologies with J^ rui, = 0. 



Adding several additional simulations at the parameter 
boundary may not be a computationally feasible solution 
to the problem due to the multi-dimensionality of the pa- 
rameter space. Instead we propose to use a nested hyper- 
cube with 6—1 = 5 dimensions. We fixed Yl '^i' = and 
varied the rest of the parameters over their aforementioned 
limits. We extended the ANN training and validation sets 
to include this five-dimensional hyperplane. Towards this, 
we generated 70 (for training) and 18 (for validation) cos- 
mologies with ^rrii, = 0. Fig. p] shows the five-dimensional 
hyperplane. 



4 ANALYSIS AND RESULTS 

We tested the precision with which our neural network 
can predict the non-linear matter power spectrum. We se- 
lected the combination 7 : Nhidden '■ 50 as our network 
architecture, where Nhidden (number of nodes in the hid- 
den layer) was varied from 7 to 56, in steps of 7. The 
number of inputs were fixed at 7, corresponding to I = 
[Q.uihF' ,^hh^ ,'ns,uio,(jg,,Yrn^) including redshift z. Note 
that we do not sample the redshift in the Latin hypercube 
but instead evaluate Pni{k, z), at 111 redshifts between z — 
and 2 = 2, using the existing prescription halofit coupled 
with the CAMB software. These HALOFlT-generated spectra 
serve as mock A'^-body spectra. We sampled Pni{k,z) at 50 
points between 0.006 /iMpc~^ < fc < l.O/iMpc"^. Since our 
training and validation sets have (130-1-70) and (32-1-18) cos- 
mologies, respectively, we calculated Pni{k,z) for each cos- 
mology, at 111 redshifts. These Pni{k, z) are scaled by their 
respective linear spectra P\i-n{k,z), before being fed to the 
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Figure 6. Left-hand column: percentage error at redshift 2 = 0, 
between the predicted non-linear power spectrum (using ANN) 
and the true underlying mock (using halofit) for 200 training 
set cosmologies shown in the upper triangles of Figs. |4] and Is] 
The profile is continuous as the 50 output values have been spline 
interpolated over the functional range. The rows (from top to 
bottom) correspond to Nf^i^^^^ = 28 — 49 in increments of 7. The 
mean error over all 200 cosmologies is shown by a dashed line 
- an indicator of any bias in the ANN training scheme. Middle- 
column: the same as the left-hand column, at redshift z = 1. 
Right-hand column: the same as the left-hand column, at redshift 
z = 2. 



neural network. Thus, the overall size of the training set that 
we train our ANN with is 200 x 111 = 22, 200. Likewise, we 
have 50 x 111 = 5,550 patterns in the validation set. We 
trained a committee of 16 ANNs at each Nhtdden setting. 
The weights w for each ANN were randomly initialized (the 
random configuration being different for each ANN). The 
weights are allowed to evolve until Xc (^^e Eq. l7| is mini- 
mized with respect to the cosmologies in the validation set. 
In Fig. [6] we show the performance of the trained ANNs 
with varying Nhidden units, when presented with each of 
the 200 cosmologies in the training set. Note that we aver- 
age the P^i^^ {k,z) predictions over all 16 ANN committee 
members. The rows correspond to Nhidden = 28 — 49 (from 
top to bottom) in increments of 7. The columns (from left 
to right) correspond to 2 = 0, 1, 2. The mean error over all 
200 cosmologies in the training set is shown by a dashed 
line in each panel, to get an idea about any systematics in 
our ANN training scheme. With Nuidden = 49, the ANN 
predictions at redshifts 2 = and 2 = 1, on a^ scales, are 
within ±1 per cent of the halofit power spectra. Although 
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we show results a,t z — and 2 = 1, we have checked that 
the predictions are 1 per cent level for all 2 < 1. Predic- 
tions are at the 1 per cent level even up to redshift 2 = 2 
for k < 0.7/iMpc~'^, after which the performance degrades 
to ±2 per cent. We have checked and confirmed that the 
worst-performing cosmologies correspond to the parameter 
settings in which at least four of the six cosmological pa- 
rameters are at their boundary values. 

As we mentioned earlier, this fitting procedure will be 
less accurate near the boundaries of the parameter ranges 
because some parameter configurations may not be encap- 
sulated within the volume of a training point. This also 
explains why the ANN performance is better at z = 1 - 
the mid-point of the redshift range. Looking at the bias 
(dashed line in Fig. [m , we see that the distribution of er- 
rors in the ANN predictions is centered on zero, indicating 
that our interpolations are not biased. A negligible bias, and 
the fact that for every cosmological setting within the pa- 
rameter priors (see Table [I]) the nori- linear power spectrum 
at 2 < 2 is correctly predicted within ±1 per cent up to 
k < 0.7/iMpc~^, demonstrates the stability of our ANN 
strategy. This marks a remarkable improvement over the 
currently popular interpolation scheme - the COSMIC emu- 
lator, which has a significant number (~ 50 per cent) of 
cosmological models with errors at ~0.5 — 1 per cent level. 
We note, however, that the COSMIC emulator, based on 
Gaussian processes, is able to achieve sub-per cent accuracy 
with only 37 distinct cosmologies while in the ANN scheme 
we use a suite of around 200 cosmologies. Comparing Fig. 
9 from Heitmann et al. ( 2009 1 with our Fig. Im we see that 
the ANN implementation performs better on all scales and 
redshifts. 

In order to check the performance of our trained ANNs 
over parameter configurations that do not touch the Latin 
hypercube, we generated a testing set of 330 cosmologies (of 
which 150 have ^rrii, — 0). See Table [2] for the parameter 
limits of the testing set. 

A testing set serves another crucial purpose. Increasing 
the number of nodes in the hidden layer increases the flexi- 
bility of a neural network. An increasingly complex network 
can make extremely accurate predictions on the training set. 
This is evident from Fig. |6] where the prediction over the 
training set becomes progressively better (from top to bot- 
tom) with increasing Nhtdden units. However, such complex 
networks can adversely affect their generalizing ability when 
presented with a new dataset. Measuring the performance 
of a neural network on an independent dataset (known as a 
testing set) as a function of Nhiddan units helps in control- 
ling its complexity. We show the testing set in Fig. [TJ with 
the lower triangle corresponding to the 150 cosmologies with 
X]mi, = 0. 

The performance of the trained ANNs as a function 
of N hidden units, over the cosmologies in the testing set, is 
shown in Fig.lS] Increasing N hidden from 42 to 49 reduces the 
error marginally. We have checked that increasing Nhidden 
beyond 49 does not contribute to any further error reduction 
on the testing set, indicating that Nhidden = 49 saturates 
the generalizing ability of the network. With Nhidden ~ 49, 
the ANN prediction for every cosmology, on all scales, at 
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Figure 7. Upper triangle: ANN testing set with 180 cosmolo- 
gies with J^ m^ > 0. Lower triangle: extending the ANN testing 
set with 150 cosmologies with J^ m„ = 0. See Table [2] for the 
parameter priors used to generate the testing set. 



redshifts z < 1, is within ±0.5 per cent of the halofit 
power spectra. For 1 < 2 < 2, in the high k range, the per- 
formance degrades slightly to ±0.8 per cent. As expected, 
the ANN performs exceedingly well away from the bound- 
aries of the parameter ranges. It is quite remarkable that 
our ANN scheme is capable of making predictions at sub- 
per cent level, especially on the testing set that is not a part 
of the ANN training process. 

In Fig. [9] left panel, we show the non-linear power spec- 
trum at redshifts 2 = and 2 = 1 predicted by our trained 
ANN (solid dots) for the same cosmology that was used 
to generate the halofit power spectrum of Fig. [2J namely 
I = (0.13, 0.0224, 0.986, -1.23, 0.72, eV) with h = 0.8. 
Nhidden is fixed at 49, as discussed above. For comparison, 
the halofit power spectra are re-plotted as starred points. 
The prediction errors at 2 = and 2 = 1 are shown in the 
middle and the right-hand panels, respectively. The ANN 
predictions are well within ±0.5 per cent over the scales of 
interest. 

We reiterate that this method for reconstructing the 
non-linear power spectrum will only function for the pa- 
rameters and ranges that have been simulated and trained 
with PkANN. The intention of this study is to provide a 
technique for high precision fits in the concordance model 
for the oncoming generation of surveys. This should there- 
fore act as a safety mechanism as it demonstrates that the 
range of validity has been breached, as often occurs with 
blind application of other fits. In Paper II, we will put our 
ANN formalism to test using matter power spectra calcu- 
lated from TV-body simulations. On mildly non-linear scales 
(0.1/iMpc~^ < k < 1.0/iMpc~^) the power spectrum from 
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Table 2. The six cosmological parameters and their ranges, used in generating the ANN testing set. This six-dimensional parameter 
space is sampled using the improved Latin hypercube technique (see the text for details) . The parameter ranges are chosen so as to avoid 
the boundaries of the parameter space. See Table IT] for the parameter boundaries. Note that the lower bound on neutrino mass is still 
set at zero, since neutrinos are physically bound (X]"^i^ ^ 0). The last column shows the WMAP 7-year-|-BAO+-ffo constraints at 68 
per cent CL. 



Cosmological parameters 



Lower value Upper value WMAP 7-year+BAOH 



WQ 

Em, (eV) 



0.120 


0.150 


0.1352 ± 0.0036 


0.022 


0.023 


0.02255 ± 0.00054 


0.90 


1.00 


0.968 ± 0.012 


-1.15 


-0.85 


-1.1 ± 0.14 


0.70 


0.85 


0.816 ± 0.024 





0.50 


< 0.58'' 



Note. "^Komatsu et al. 12011 1; "95 per cent CL for w = -1. 
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Figure 8. The same as Fig. [6] using 330 testing set cosmologies 
shown in Fig. W\ 



A'^-body simulations is expected to vary smoothly as a func 
tion of cosmological parameters ( Heitmann et al.|2006 2009 



Agarwal fc Feldman|20lTL As such, our ANN interpolation 



scheme that we have shown here to work well on halofit 
power spectra is expected to perform satisfactorily on A''- 
body spectra as well. 



5 CONCLUSIONS 

The advent of the era of precision cosmology poses an im- 
mense challenge to theoretical physics. The upcoming gen- 
eration of surveys has the potential to breach per cent level 



of accuracy. Such high-precision data will improve our con- 
straints on cosmological parameters including dark energy 
and neutrino masses. Efficiently dealing with this impending 
flood of precise data on ever smaller scales and lower red- 
shifts requires that we move on from linear theory as well as 
any imperfect sets of fltting equations. Although numerical 
simulations are capable of achieving the levels of precision 
required by the near-future surveys, the high dimensional- 
ity of the cosmological parameter space renders their brute 
force usage impractical. 

We have introduced a unique approach to coping with 
non-linearities in the matter power spectra in cosmology. By 
employing a multi-layer perceptron neural network together 
with an improved Latin hypercube parameter sampling tech- 
nique, we have demonstrated that the non-linear spectrum 
can be reconstructed from a full set of A cold dark matter 
parameters to better than 1 per cent over the parameter 
space spanning 3a confldence level around the WMAP 7- 
year central values. Parameters that are likely to reside by 
some hard physical prior, such as the neutrino mass, can be 
successfully brought under the realms of ANNs by sprinkling 
of extra simulations in the corresponding {e.g. ^rrii, = 0) 
hyper-plane. 

Looking forward, our ANN procedure can be read- 
ily employed for a variety of cosmological tasks such as 
fltting halo mass functions obtained through high resolu- 
tion A'^-body simulations. Moreover, mixed datasets such 
as the matter power spectra and the halo mass functions 
can be combined and presented to a neural network as the 
training set. An ANN trained with such a heterogeneous 
dataset would be capable of cosmological parameter estima- 
tion when presented with the combined observations of the 
matter power spectrum and the measured halo mass func- 
tion. The implementation of our technique avoids complex 
calculations and, through the execution of only the neural 
network weights, is extremely fast (predictions take less than 
a second). An automated PkANN function will be released 
with our program of A'^-body results in a companion pa- 
per shortly. Beyond this we hope that with our method a 
collaborative effort could reduce non-linear error to only un- 
certainty in the A^-body simulation's baryon interactions. 
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Figure 9. Left: ANN predictions (solid dots) at ^ = (upper) and 2 = 1 (lower) for the HALOFIT power spectra of Fig. p\ (re-plottcd 
here as starred points - difficult to see due to the excellent ANN predictions). The cosmological model is I = (0.13, 0.0224, 0.986, -1.23, 
0.72, eV) with h = 0.8. Nf^i^^f.^ is fixed at 49, as discussed in the text. The ratio of ANN predictions to the HALOFIT spectra is shown 
in the other panels. Middle: percentage error in ANN predictions, at 2 = 0. Right: percentage error in ANN predictions, at 2 = 1. The 
ANN predictions are within ±0.5 per cent over the scales of interest. 
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